Amendments to the Claims:
This listing of claims replaces all prior versions and listings of claims in the application: 

Listing of Claims:

1. (currently amended) A method for selecting segments from a corpus of source
utterances for synthesizing a target utterance, comprising: 

searching a graph in which each path through the graph identifies a sequence of segments 
of the corpus of source utterances and a corresponding sequence of unit labels that characterizes 
a pronunciation of a concatenation of that sequence of segments, each path being associated with 
a numerical score that characterizes a quality of the sequence of segments; 

wherein searching the graph includes matching a pronunciation of the target utterance to 
paths through the graph, and selecting segments for synthesizing the target utterance based on 
numerical scores of matching paths through the graph. 

2. (original) The method of claim 1 wherein selecting segments for synthesizing the 
target utterance includes identifying a path through the graph that matches the pronunciation of 
the target utterance and selecting the sequence of segments that is identified by the determined 
path. 

3. (original) The method of claim 2 wherein determining the path includes 
determining a best scoring path through the graph. 

4. (original) The method of claim 3 wherein determining the best scoring path 
involves using a dynamic programming algorithm. 
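For illustration only, the search recited in claims 1 through 4 can be sketched as a small dynamic program. This is a minimal sketch under stated assumptions, not the applicant's implementation: the names `best_segment_sequence` and `join_cost` are hypothetical, each source utterance is reduced to a list of unit labels, a lower score is treated as better, and joining two segments that are not adjacent in the same source utterance incurs a single fixed penalty.

```python
from typing import Dict, List, Tuple

Segment = Tuple[int, int]  # (utterance index, position within that utterance)


def best_segment_sequence(
    corpus: List[List[str]],
    target: List[str],
    join_cost: float = 1.0,  # assumed fixed penalty for a non-contiguous join
) -> Tuple[float, List[Segment]]:
    """Return the lowest-cost segment sequence whose unit labels spell `target`."""
    if not target:
        return 0.0, []

    # Index every corpus segment by its unit label.
    by_label: Dict[str, List[Segment]] = {}
    for u, labels in enumerate(corpus):
        for p, label in enumerate(labels):
            by_label.setdefault(label, []).append((u, p))

    # best[seg] holds (cost, path) of the cheapest path that ends at seg
    # and matches the target prefix processed so far.
    best: Dict[Segment, Tuple[float, List[Segment]]] = {}
    for i, label in enumerate(target):
        candidates = by_label.get(label, [])
        if not candidates:
            raise ValueError(f"no segment with unit label {label!r}")
        new_best: Dict[Segment, Tuple[float, List[Segment]]] = {}
        for seg in candidates:
            if i == 0:
                new_best[seg] = (0.0, [seg])
                continue
            options = []
            for prev, (cost, path) in best.items():
                # Consecutive segments of one source utterance join for free.
                contiguous = prev[0] == seg[0] and prev[1] + 1 == seg[1]
                step = 0.0 if contiguous else join_cost
                options.append((cost + step, path + [seg]))
            new_best[seg] = min(options, key=lambda o: o[0])
        best = new_best

    return min(best.values(), key=lambda o: o[0])
```

With a corpus of `[["h", "e", "l", "o"], ["y", "e", "l", "p"]]` and the target `["h", "e", "l", "p"]`, every matching path needs at least one cross-utterance join, so the best score under these assumptions is one `join_cost`.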



Applicant : Jon Rong-Wei Yi et al. Attorney's Docket No.: 01997-294001 / Case No. 8973 

Serial No. : 09/954,979 

Filed : September 17, 2001 

Page : 3 of 9 



5. (original) The method of claim 2 further comprising concatenating the selected 
sequence of segments to form a waveform representation of the target utterance. 

6. (original) The method of claim 1 wherein selecting the segments for synthesizing 
the target utterance includes determining a plurality of paths through the graph that each matches 
the representation of the pronunciation of the target utterance. 

7. (original) The method of claim 6 wherein selecting the segments further includes 
forming a plurality of sequences of segments, each associated with a different one of the plurality 
of paths. 

8. (original) The method of claim 7 wherein selecting the segments further includes 
selecting one of the sequences of segments based on characteristics of those sequences of 
segments not determined by the corresponding sequences of unit labels associated with those 
sequences. 

9. (original) The method of claim 1 further comprising forming a representation of a 
plurality of pronunciations of the target utterance, and wherein searching the graph includes 
matching any of the pronunciations of the target utterance to paths through the graph. 

10. (original) The method of claim 1 further comprising forming a representation of
the pronunciation of the target utterance in terms of alternating unit labels and transition labels.

11. (original) The method of claim 1 wherein the graph includes a first part that 
encodes a sequence of segments and a corresponding sequence of unit labels for each of the 
source utterances, and a second part that encodes allowable transitions between segments of 
different source utterances and encodes a transition score for each of those transitions; and 
matching the pronunciation of the target utterance to paths through the graph includes
considering paths in which each transition between segments of different source utterances 
identified by that path corresponds to a different subpath of that path that passes through the 
second part of the graph. 

12. (original) The method of claim 10, wherein selecting the segments for synthesis 
includes evaluating a score for each of the considered paths that is based on the transition scores 
associated with the subpaths through the second part of the graph. 

13. (original) The method of claim 10 wherein a size of the second part of the graph is 
substantially independent of a size of the source corpus, and a complexity of matching the 
pronunciation through the graph grows less than linearly with the size of the corpus. 

14. (original) The method of claim 1 further comprising: 

providing the corpus of source utterances, each source utterance being segmented into a 
sequence of segments, each consecutive pair of segments in a source utterance forming a 
segment boundary, and each speech segment being associated with a unit label and each segment 
boundary being associated with a transition label; and 

forming the graph, including forming a first part of the graph that encodes a sequence of 
segments and a corresponding sequence of unit labels for each of the source utterances, and 
forming a second part that encodes allowable transitions between segments of different source 
utterances and encodes a transition score for each of those transitions. 

15. (original) The method of claim 14 wherein forming the second part of the graph is 
performed independently of the utterances in the corpus of source utterances. 

16. (original) The method of claim 14 further comprising: 
augmenting the corpus of source utterances with additional utterances; and 






augmenting the graph including augmenting the first part of the graph to encode the 
additional utterances, and linking the augmented first part to the second part without modifying 
the second part based on the additional utterances. 
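For illustration only, the two-part organization recited in claims 11 and 14 through 16 can be sketched as follows. This is a minimal sketch under stated assumptions: the class `TwoPartGraph` and its methods are hypothetical names, and keying the second part by pairs of unit labels is an assumption made here, since the claims do not specify how the second part is structured. The point shown is that adding utterances touches only the first part.

```python
from typing import Dict, Iterator, List, Tuple

Segment = Tuple[int, int]  # (utterance index, position within that utterance)


class TwoPartGraph:
    def __init__(self, transition_scores: Dict[Tuple[str, str], float]):
        # Second part: allowable cross-utterance transitions and their scores,
        # keyed by (label left, label entered). Fixed at construction.
        self.second_part = dict(transition_scores)
        # First part: one label sequence per source utterance.
        self.first_part: List[List[str]] = []
        # Links from the second part back into the first part.
        self.entries: Dict[str, List[Segment]] = {}

    def add_utterance(self, labels: List[str]) -> int:
        """Augment the first part and link it; the second part is not modified."""
        u = len(self.first_part)
        self.first_part.append(list(labels))
        for p, label in enumerate(labels):
            self.entries.setdefault(label, []).append((u, p))
        return u

    def cross_transitions(self, seg: Segment) -> Iterator[Tuple[Segment, float]]:
        """Yield (next segment, score) pairs reached through the second part."""
        u, p = seg
        left = self.first_part[u][p]
        for (a, b), score in self.second_part.items():
            if a != left:
                continue
            for nxt in self.entries.get(b, []):
                if nxt[0] != u:  # only transitions between different utterances
                    yield nxt, score
```

In this sketch the number of entries in `second_part` depends only on the label inventory, not on how many utterances have been added.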

17. (original) The method of claim 1 wherein the graph is associated with a finite-state
transducer which accepts input symbols that include unit labels and transition labels, and
that produces identifiers of segments of the source utterances, and wherein searching the graph is
equivalent to composing a finite-state transducer representation of a pronunciation of the target
utterance with the finite-state transducer with which the graph is associated.

18. (currently amended) Software stored on a computer-readable medium for causing 
a computer to perform functions comprising selecting segments from a corpus of source 
utterances for synthesizing a target utterance, wherein selecting the segments comprises: 

searching a graph in which each path through the graph identifies a sequence of segments 
of the corpus of source utterances and a corresponding sequence of unit labels that characterizes 
a pronunciation of a concatenation of that sequence of segments, each path being associated with 
a numerical score that characterizes a quality of the sequence of segments; 

wherein searching the graph includes matching a pronunciation of the target utterance to 
paths through the graph, and selecting segments for synthesizing the target utterance based on 
numerical scores of matching paths through the graph. 



